Patent abstract:
The present invention discloses a cooperative-optimization control method of a charging station based on a double-center Q-learning method, including: 1, describing a control process of charging service requests of two charging forms of electric vehicles which arrive randomly as an event-driven decision-making process; 2, describing a process of controlling the electric vehicle which is charged in the charging station to respond to a power-grid peak regulation electricity price plan as a sequential decision-making process; 3, taking a peak regulation electricity price and the online service state of a charging point as a system state, taking the fact that the electric vehicle arrives and makes the service request as an event, and selecting whether the electric vehicle is admitted and provided with charging service or not as an admission control action; 4, at the epoch when the peak regulation electricity price is issued, selecting charging and discharging actions of all the AC charging electric vehicles which are served as a peak regulation control action; and 5, performing online cooperative optimization on an electric-vehicle admission control center and a control center for peak regulation response of a system with a Q-learning algorithm. With the present invention, effective electric-vehicle intelligent admission control and peak regulation response control may be performed on the charging station, thereby adapting to peak regulation demands of a power grid.
Publication number: NL2026738A
Application number: NL2026738
Filing date: 2020-10-23
Publication date: 2021-08-18
Inventors: Tang Ziyu; Tang Hao; Zhao Chuanxin; Fang Daohong; Fang Mingxing
Applicants: Anhui Normal University; Hefei University of Technology
Main IPC class:
Patent description:

[0001] The present invention pertains to the technical field of intelligent control and optimization, and particularly relates to a cooperative-optimization control method of a charging station based on a double-center Q-learning method.

BACKGROUND
[0002] At present, China is the largest vehicle consumption market in the world. Vehicle manufacturers are shifting research, development and production emphases from vehicles powered by traditional energy to new energy vehicles, and electric vehicles will remain the mainstream of new energy vehicle development for a long period of time, with huge consumption potential and an increasing market share. Charging points are important infrastructures for providing charging service for the electric vehicles, and also an important link in the industrialization and commercialization process of the electric vehicles. With the rapid development of the electric vehicle industry and the great increase of market holdings of electric vehicles, a charging station where centralized management and operation are performed on a plurality of charging points will be an important business mode and service form in the future. In addition, with the increasing penetration of new energy such as wind power and photovoltaic energy, as well as the maturing of the vehicle-to-grid interaction technology (V2G technology) between electric vehicles and the power grid, the intelligence and adaptability of electricity production and service will improve, and effective management and guidance of electricity consumption of power consumers, such as the charging station, will be a trend. For example, a dispatching center at each level may make an electricity peak regulation plan according to source-load prediction data and issue a real-time electricity price, thereby guiding the power consumer, for example the charging station for the electric vehicle, to consume electricity reasonably and perform V2G electricity feedback, and promoting autonomous peak shaving or peak shifting at the consumer side.
[0003] A time-of-use electricity price mechanism which is quite simple and fixed is adopted for the existing power-grid electricity price, a power-grid peak regulation
[0004] In order to overcome the defects in the prior art, the present invention provides a cooperative-optimization control method of a charging station based on a double-center Q-learning method, so as to online cooperatively optimize the admission control for a service request of an electric vehicle by an admission control center and the peak regulation control by a control center for peak regulation response in the charging station, thereby improving the running economy of the charging station and adapting to the power-grid peak regulation demand.
[0005] In order to solve the technical problems, the following technical solution is adopted in the present invention:
[0006] the cooperative-optimization control method of the charging station based on the double-center Q-learning method according to the present invention is characterized by being applied to a service system of the charging station, which is provided with J_D DC charging points, J_A AC charging points and J_AD AC and DC-hybrid charging points and provides paid charging service for M_D DC fast-charging electric vehicles which arrive randomly and M_A AC slow-charging electric vehicles which arrive randomly;
[0007] each DC charging point is enabled to meet charging power demands of the M_D DC fast-charging electric vehicles, each AC charging point is enabled to meet charging power demands of the M_A AC slow-charging electric vehicles, each AC and DC-hybrid charging point is enabled to meet the charging power demands of the M_D DC fast-charging electric vehicles and the M_A AC slow-charging electric vehicles, and one charging point is enabled to provide the charging service for only one electric vehicle at a time;
[0008] the J_D DC charging points are denoted as CS_1^D, CS_2^D, ..., CS_j^D, ..., CS_{J_D}^D respectively, the J_A AC charging points are denoted as CS_1^A, CS_2^A, ..., CS_j^A, ..., CS_{J_A}^A respectively, and the J_AD AC and DC-hybrid charging points are denoted as CS_1^{AD}, CS_2^{AD}, ..., CS_j^{AD}, ..., CS_{J_AD}^{AD} respectively; CS_j^D represents the jth DC charging point, j ∈ D_{J_D}, and D_{J_D} = {1, 2, ..., J_D} represents a set of codes of the DC charging points; CS_j^A represents the jth AC charging point, j ∈ D_{J_A}, and D_{J_A} = {1, 2, ..., J_A} represents a set of codes of the AC charging points; CS_j^{AD} represents the jth AC and DC-hybrid charging point, j ∈ D_{J_AD}, and D_{J_AD} = {1, 2, ..., J_AD} represents a set of codes of the AC and DC-hybrid charging points;
[0009] the charging power demands of the M_D DC fast-charging electric vehicles are denoted as P_1^D, P_2^D, ..., P_m^D, ..., P_{M_D}^D, and the charging power demands of the M_A AC slow-charging electric vehicles are denoted as P_1^A, P_2^A, ..., P_m^A, ..., P_{M_A}^A; P_m^D represents the charging power demand of the mth DC charging electric vehicle, m ∈ D_{M_D}, and D_{M_D} = {1, 2, ..., M_D} represents a set of codes of types of the DC charging
[0010] it is assumed that the power-grid peak regulation electricity price is periodically issued according to a dispatching instruction, K is the number of issued
[0011] it is assumed that the m_t th electric vehicle randomly arrives at the charging station at the epoch t to apply for charging service, and m_t ∈ D_{M_D} ∪ D_{M_A}; if the current state of charge (SOC) of a battery of the m_t th electric vehicle is SOC_{m_t}(t), the arrival event of the m_t th electric vehicle is denoted as E(m_t, SOC_{m_t}(t));
[0012] the combined state of the three types of charging points at the epoch t is denoted as C_t = {CS^D, CS^A, CS^{AD}}; CS^D = (CS_1^D(t), CS_2^D(t), ..., CS_j^D(t), ..., CS_{J_D}^D(t)), and CS_j^D(t) = (m_j^D(t), SOC_{m_j^D(t)}(t)) represents the service state of the jth DC charging point; m_j^D(t) represents the type of the electric vehicle which is served by the jth DC charging point CS_j^D at the epoch t, m_j^D(t) = 0 indicates that no vehicle is admitted at the jth DC charging point CS_j^D at the epoch t, and m_j^D(t) ∈ D_{M_D} indicates that the jth DC charging point CS_j^D is charging one electric vehicle in D_{M_D}; SOC_{m_j^D(t)}(t) represents the current SOC of the battery of the m_j^D(t) th electric vehicle which is served by the jth DC charging point CS_j^D at the epoch t;
[0013] CS^A = (CS_1^A(t), CS_2^A(t), ..., CS_j^A(t), ..., CS_{J_A}^A(t)), and CS_j^A(t) = (m_j^A(t), SOC_{m_j^A(t)}(t)) represents the service state of the jth AC charging point; m_j^A(t) represents the type of the electric vehicle which is served by the jth AC charging
[0014] CS^{AD} = (CS_1^{AD}(t), CS_2^{AD}(t), ..., CS_j^{AD}(t), ..., CS_{J_AD}^{AD}(t)), and CS_j^{AD}(t) = (m_j^{AD}(t), SOC_{m_j^{AD}(t)}(t)) represents the service state of the jth AC and DC-hybrid charging point; m_j^{AD}(t) represents the type of the electric vehicle which is served by the jth AC and DC-hybrid charging point CS_j^{AD} at the epoch t, m_j^{AD}(t) = 0 indicates
[0015] the state of the service system of the charging station at any epoch t is denoted as s_t = {t, C_t, PR_t};
[0016] the epoch τ_k of issuing the kth peak regulation electricity price PR_k is taken as a decision-making epoch for peak regulation of the control center for peak regulation response, and the energy exchange directions of all the AC charging electric vehicles admitted by the service system of the charging station are denoted as actions d_k at the decision-making epoch for peak regulation, d_k = {(d_k^A(1), d_k^A(2), ..., d_k^A(j), ..., d_k^A(J_A)), (d_k^{AD}(1), d_k^{AD}(2), ..., d_k^{AD}(j), ..., d_k^{AD}(J_AD))}; d_k^A(j) ∈ D_r = {-1, 0, 1}, j ∈ D_{J_A}; d_k^{AD}(j) ∈ D_r = {-1, 0, 1}, j ∈ D_{J_AD}; -1 represents discharging peak regulation, 0 represents no charging and discharging action, and 1 represents charging;
[0017] it is assumed that the DC charging electric vehicle admitted into the service in the service system of the charging station does not participate in peak regulation, and the AC charging electric vehicle admitted into the service may participate in peak regulation;
[0018] at the kth decision-making epoch τ_k for peak regulation, in the jth AC and DC-hybrid charging point, if m_j^{AD}(t) ∈ D_{M_A}, d_k^{AD}(j) ∈ D_r, and if m_j^{AD}(t) = 0, d_k^{AD}(j) = 0, and j ∈ D_{J_AD}; in the jth AC charging point, if m_j^A(t) = 0, d_k^A(j) = 0, and j ∈ D_{J_A};
[0019] at the kth decision-making epoch τ_k for peak regulation, a set of feasible peak regulation control actions d_k of the charging station is defined as D_r^k, and D_r^k ⊆ D_R, wherein D_R is a Cartesian product of J_A + J_AD sets D_r of peak regulation control actions;
[0020] the occurrence epoch t of the arrival event E(m_t, SOC_{m_t}(t)) of the m_t th electric vehicle is taken as a decision-making epoch for admission control of the admission control center for the electric vehicle, and event information of the decision-making epoch for admission control and state information of the service system of the charging station are combined and defined as an event-extended state s_t^e = {t, C_t, PR_t, m_t, SOC_{m_t}(t)};
[0021] at the decision-making epoch for admission control, whether the service system of the charging station admits the electric vehicle and provides the charging service is denoted as an admission control action a, and the action at the nth decision-making epoch τ_n for admission control is denoted as a_n, and a_n ∈ D_A = {0, 1}, wherein 0 represents service refusal, 1 represents service admission, and D_A represents a set of actions of the admission control center;
[0022] at the nth decision-making epoch τ_n for admission control, if the type m_t ∈ D_{M_D} of the arriving electric vehicle, all the DC charging points and all the AC and
[0023] the cooperative-optimization control method of the charging station based on the double-center Q-learning method is divided into electric-vehicle admission control and peak regulation response control;
[0024] in the electric-vehicle admission control, a control process of charging service requests of the DC charging electric vehicle and the AC charging electric vehicle which arrive randomly is described as an event-driven decision-making process; the electric vehicle arriving and making the service request is taken as an event; the peak regulation electricity price and the online service state of the charging point are taken as the state of the service system of the charging station; when the event occurs, the event information and the state information of the service system are combined into the event-extended state; whether the electric vehicle is admitted and provided with the charging service or not is selected as the admission control action; sample data feedback is thus obtained, a Q-value table for admission control is updated with the Q-learning method, and finally, a strategy table for admission control is obtained;
[0025] in the peak regulation response control, a process of controlling the electric vehicle which is charged in the charging station to respond to a power-grid peak regulation electricity price plan is described as a sequential decision-making process; at the epoch when the peak regulation electricity price is issued, the charging and discharging actions of all the AC charging electric vehicles which are served are selected as the peak regulation control actions according to the state of the service system of the charging station; sample data feedback is thus obtained, a Q-value table for peak regulation control is updated with the Q-learning method, and finally, a strategy table for peak regulation control is obtained.
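The two control processes above run on different clocks: admission decisions are event-driven (triggered by random electric-vehicle arrivals), while peak regulation decisions are periodic (triggered by price issues). A minimal discrete-event sketch of this asynchronous timing; the arrival rate, horizon and period count are illustrative assumptions, not values from the patent.

```python
import heapq
import random

random.seed(0)
T = 24.0          # horizon (one day, in hours; assumed)
K = 4             # number of peak-regulation price periods (assumed)
events = []       # priority queue of (time, kind) decision epochs

# Periodic peak-regulation decision epochs tau_k.
for k in range(K):
    heapq.heappush(events, (k * T / K, "peak_regulation"))

# Random EV arrivals: the event-driven admission decision epochs.
t = 0.0
while True:
    t += random.expovariate(2.0)   # mean inter-arrival 0.5 h (assumed)
    if t >= T:
        break
    heapq.heappush(events, (t, "admission"))

log = []
while events:
    _, kind = heapq.heappop(events)
    log.append(kind)   # each popped event is one decision-making epoch

print(len([e for e in log if e == "peak_regulation"]))  # 4 periodic epochs
```

The two centers thus never decide at the same epoch by construction, which is the asynchronous-decision setting the double-center method addresses.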
[0026] The cooperative-optimization control method of the charging station based on the double-center Q-learning method is also characterized in that the admission control of the electric vehicle includes the following steps:
[0027] step 1: defining and initializing an exploration rate of the admission control action at the nth decision-making epoch τ_n for admission control as ε_n, and letting 0 < ε_n < 1;
[0028] defining elements in the Q-value table for admission control as discretization event-extended state-action pair learning values, and initializing the elements in the Q-value table for admission control;
[0029] defining a current greedy strategy table V for admission control as a set formed by the actions corresponding to the maximum discretization event-extended state-action pair learning value of each row in the Q-value table for admission control;
[0030] step 2: initializing t = 0 and n = 1; assigning the current exploration rate ε_n for the admission control action to an initial exploration rate ε_0; assigning the current greedy strategy table V for admission control to an original strategy table V^0;
[0031] step 3: at the nth decision-making epoch τ_n for admission control of the service system of the charging station, when the arrival event E(m_t, SOC_{m_t}(t)) occurs, observing the current state s_t of the service system of the charging station to form the event-extended state s_t^e;
[0032] denoting the discretization state corresponding to the event-extended state s_t^e of the nth decision-making epoch τ_n for admission control in the Q-value table as s_n^e;
[0033] denoting the action which is actually taken in the event-extended state s_t^e at the nth decision-making epoch τ_n for admission control as v(s_t^e), wherein v(s_t^e) ∈ D_A;
[0034] in the event-extended state s_t^e at the nth decision-making epoch τ_n for admission control, extracting a greedy action in the discretization state s_n^e corresponding to s_t^e from the Q-value table and denoting it as v(s_n^e);
[0035] in the event-extended state s_t^e at the nth decision-making epoch τ_n for admission control, if the type m_t ∈ D_{M_D} of the arriving electric vehicle, all the DC
[0036] after the admission control center of the charging station takes the action v(s_t^e), observing and obtaining a system transition sample track (s_t^e, v(s_t^e), s_{t'}^e) transited from the nth decision-making epoch τ_n for admission control to the (n+1)th decision-making epoch τ_{n+1} for admission control or the epoch T, wherein t = τ_n, t' = τ_{n+1} < T, or t' = T; when t' = T, letting s_{t'}^e = {T, C_T, PR_T, 0, 0};
[0037] step 4: observing and calculating the combined quantity r(s_t^e, v(s_t^e), s_{t'}^e) of charging rewards and peak regulation rewards obtained in the state transition process of the service system of the charging station from the current action v(s_t^e) taking state s_t^e = {t, C_t, PR_t, m_t, SOC_{m_t}(t)} at the nth decision-making epoch τ_n for admission control to the state s_{t'}^e = {t', C_{t'}, PR_{t'}, m_{t'}, SOC_{m_{t'}}(t')} at the (n+1)th decision-making epoch τ_{n+1} for admission control or the epoch T;
[0038] step 5: updating the discretization event-extended state-action pair learning value Q(s_n^e, v(s_t^e)) for taking the action v(s_t^e) in the discretization state s_n^e corresponding to s_t^e in the Q-value table for admission control by using a difference formula and a Q-value updating formula shown in Equ. (1) and Equ. (2), and assigning the value to Q(s_n^e, v(s_t^e)):

d(s_t^e, v(s_t^e), s_{t'}^e) = r(s_t^e, v(s_t^e), s_{t'}^e) + max_{a ∈ D_A} Q(s_{n+1}^e, a) - Q(s_n^e, v(s_t^e))    (1)

Q(s_n^e, v(s_t^e)) := Q(s_n^e, v(s_t^e)) + γ(s_n^e, v(s_t^e)) d(s_t^e, v(s_t^e), s_{t'}^e)    (2)
[0041] wherein in Equ. (1), Q(s_{n+1}^e, a) represents the discretization event-extended state-action pair learning value for taking the action a in the discretization state s_{n+1}^e corresponding to the state s_{t'}^e of transition to the (n+1)th decision-making epoch τ_{n+1} for admission control or the epoch T;
[0042] in Equ. (2), the operator ":=" indicates that the value of the right formula is calculated first and then given to the left variable; γ(s_n^e, v(s_t^e)) is a learning step length for taking the action v(s_t^e) in the discretization state s_n^e at the nth decision-making epoch τ_n for admission control;
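Equ. (1) and Equ. (2) together form a standard Q-learning update (as written, without a discount factor). A minimal sketch, assuming a dict-based Q-table and a constant learning step length; both are illustrative choices, since the patent allows the step length to depend on the state-action pair.

```python
from collections import defaultdict

Q = defaultdict(float)   # Q[(state, action)] learning values, initialized to 0
STEP = 0.1               # learning step length gamma(s, a), assumed constant here

def q_update(s, a, r, s_next, actions):
    """Difference formula (1) followed by the Q-value update (2)."""
    d = r + max(Q[(s_next, b)] for b in actions) - Q[(s, a)]   # Equ. (1)
    Q[(s, a)] += STEP * d                                      # Equ. (2)
    return Q[(s, a)]

# Hypothetical sample: reward 5.0 on a transition from state 0 to state 1.
print(q_update(s=0, a=1, r=5.0, s_next=1, actions=(0, 1)))   # 0.1 * 5.0 = 0.5
```

With all learning values initialized to zero, the first update simply moves Q(s, a) a step-length fraction toward the observed reward.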
[0043] step 6: selecting the action corresponding to the maximum discretization event-extended state-action pair learning value of each row in the updated Q-value table for admission control to form the current action set for admission control, taking the current action set as the updated greedy strategy table for admission control, and assigning it to the current greedy strategy V for admission control; degrading the exploration rate ε_n, thereby obtaining the updated exploration rate and assigning it to ε_{n+1};
[0044] step 7: if t' < T, assigning n+1 to n, and returning to the step 3; otherwise, indicating t' = T, and performing step 8; and
[0045] step 8: judging whether the strategy table V for admission control is equal to V^0 or not; if so, stopping updating and performing admission control on the random charging service requests of the M electric vehicles with the current strategy table V for admission control; otherwise, returning to the step 2 for execution;
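Steps 1 to 8 above amount to an ε-greedy Q-learning loop with a degrading exploration rate. The following compact sketch mirrors that structure on a toy environment; the states, rewards and parameter values are purely illustrative stand-ins for the event-extended states and combined rewards of the patent.

```python
import random

random.seed(1)

# Toy setting: 4 discretized event-extended states, admission actions
# 0 = refuse, 1 = admit.  Rewards and transitions are assumed, not the
# patent's; they only exercise the update loop of steps 2-8.
N_STATES, ACTIONS = 4, (0, 1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    """Action with the maximum learning value in the row for state s."""
    return max(ACTIONS, key=lambda a: Q[(s, a)])

eps, step = 0.5, 0.1                   # exploration rate and step length
for episode in range(200):             # step 2: re-initialize and iterate
    s = 0
    for n in range(20):                # steps 3-7: one pass over arrivals
        a = random.choice(ACTIONS) if random.random() < eps else greedy(s)
        r = 1.0 if a == 1 else 0.0     # assumed reward: admission earns revenue
        s_next = (s + 1) % N_STATES
        d = r + max(Q[(s_next, b)] for b in ACTIONS) - Q[(s, a)]  # Equ. (1)
        Q[(s, a)] += step * d                                     # Equ. (2)
        s = s_next
    eps *= 0.99                        # step 6: degrade the exploration rate

policy = {s: greedy(s) for s in range(N_STATES)}   # step 8: strategy table
print(policy)
```

The stopping test of step 8 (strategy table unchanged between sweeps) is replaced here by a fixed episode count for brevity.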
[0046] the peak regulation response control includes the following steps:
[0047] step -1: defining and initializing an exploration rate of the peak regulation control action at the kth decision-making epoch τ_k for peak regulation control as ε_k, and letting 0 < ε_k < 1;
[0048] defining elements in the Q-value table for peak regulation control as state-action pair learning values of the service system of the charging station, and initializing the elements in the Q-value table for peak regulation control;
[0049] defining a current greedy strategy table V for peak regulation control as
[0050] step -2: initializing t = 0 and k = 0; assigning the current exploration rate ε_k for the peak regulation control action to an original exploration rate ε_0; assigning the current greedy strategy table V for peak regulation control to an original strategy table V^0;
[0051] step -3: at the kth decision-making epoch τ_k for peak regulation control of the service system of the charging station, observing the current state s_t of the service system of the charging station;
[0052] denoting the discretization state corresponding to the system state s_t of the kth decision-making epoch τ_k for peak regulation control in the Q-value table for peak regulation control as s_k;
[0053] denoting the peak regulation control action which is actually taken in the system state s_t at the kth decision-making epoch τ_k for peak regulation control as v(s_t), wherein v(s_t) ∈ D_r^k;
[0054] in the system state s_t at the kth decision-making epoch τ_k for peak regulation control, extracting a greedy action in the discretization state s_k corresponding to the current state s_t from the Q-value table for peak regulation control and denoting it as v(s_k);
[0055] in the system state s_t at the kth decision-making epoch τ_k for peak regulation control, randomly selecting an action from the feasible action set D_r^k according to the current exploration rate ε_k for peak regulation control and assigning the action to v(s_t), and assigning v(s_k) to v(s_t) with the probability 1 - ε_k;
[0056] after the control center for peak regulation of the charging station takes the action v(s_t), observing and obtaining a system transition sample track (s_t, v(s_t), s_{t'}) transited from the kth decision-making epoch τ_k for peak regulation control to the
[0057] step -4: observing and calculating the combined quantity r(s_t, v(s_t), s_{t'}) of charging rewards and peak regulation rewards obtained in the state transition process of the service system of the charging station from the current action v(s_t) taking state s_t at the kth decision-making epoch τ_k for peak regulation control to the state s_{t'} at the (k+1)th decision-making epoch τ_{k+1} for peak regulation control;
[0058] step -5: updating the discretization state-action pair learning value Q(s_k, v(s_t)) for taking the action v(s_t) in the discretization state s_k corresponding to s_t in the Q-value table for peak regulation control by using a difference formula and a Q-value updating formula shown in Equ. (3) and Equ. (4), and assigning the value to Q(s_k, v(s_t)):

d(s_t, v(s_t), s_{t'}) = r(s_t, v(s_t), s_{t'}) + max_{d ∈ D_r^{k+1}} Q(s_{k+1}, d) - Q(s_k, v(s_t))    (3)

Q(s_k, v(s_t)) := Q(s_k, v(s_t)) + γ(s_k, v(s_t)) d(s_t, v(s_t), s_{t'})    (4)
[0061] wherein in Equ. (3), Q(s_{k+1}, d) represents the discretization state-action pair learning value for taking the feasible action d in the discretization state s_{k+1} corresponding to the state s_{t'} of transition to the (k+1)th decision-making epoch τ_{k+1} for peak regulation control;
[0062] in Equ. (4), the operator ":=" indicates that the value of the right formula is calculated first and then given to the left variable; γ(s_k, v(s_t)) is a learning step length for taking the action v(s_t) in the discretization state s_k at the kth decision-making epoch τ_k for peak regulation control;
[0063] step -6: selecting the action corresponding to the maximum discretization state-action pair learning value of each row in the updated Q-value table for peak regulation control to form the current action set for peak regulation control, taking the current action set as the updated greedy strategy table for peak regulation control, and assigning it to the current greedy strategy V for peak regulation control; degrading the exploration rate ε_k, thereby obtaining the updated exploration rate and assigning it to ε_{k+1};
[0064] step -7: if k < K, assigning k+1 to k, and returning to the step -3; otherwise, performing step -8; and
[0065] step -8: judging whether the strategy table V for peak regulation control is equal to V^0 or not; if so, stopping updating and performing peak regulation control on the AC charging electric vehicles served by the charging station with the current greedy strategy table V for peak regulation control; otherwise, returning to the step -2 for execution.
[0066] Compared with the prior art, the present invention has the following beneficial effects.
[0067] 1. In the present invention, the epoch when the power-grid peak regulation electricity price is issued is taken as the decision-making epoch for peak regulation of the control center for peak regulation response, and the energy exchange directions of all the AC charging electric vehicles admitted by the service system of the charging station are taken as the decision-making actions. Decisions are made according to the system state, which includes the starting epoch of a peak regulation period, the real-time state of the charging points in the system and the current power-grid peak regulation electricity price. Taking the starting epoch of the peak regulation period and the current power-grid peak regulation electricity price as part of the system state facilitates reflection of the time sequence characteristic of peak regulation of the power grid, enables the control strategy to adapt to the peak regulation demands of the power grid and better conform to actual situations, and improves the feasibility of the method.
[0068] 2. In the present invention, the power-grid peak regulation electricity price and the online service state of the charging point are taken as the state of the service system of the charging station; the charging service request of the electric vehicle which arrives randomly is taken as the event; the random event and the state of the service system of the charging station are combined into the event-extended state; whether the arriving electric vehicle is admitted into the charging station to be provided with the charging service is taken as the system action; the epoch when the charging service request of the electric vehicle arrives randomly is taken as the decision-making epoch for admission control; the intelligent admission control process of the electric vehicle at
[0069] 3. In the present invention, admission of the electric vehicle at the charging station is intelligently controlled and optimized with the Q-learning method of the electric-vehicle admission control center, and the energy interaction between the served AC electric vehicles of the charging station and the power grid is intelligently controlled and optimized with the Q-learning method of the control center for peak regulation response. Compared with a theoretical solution method, in the present invention, a complete mathematical modeling process is not required to be performed on the control system, and particularly, the random characteristics in the system are not required to be modeled precisely. With the present invention, a better control strategy may be obtained by observing running samples of the system to perform a real-time online learning process. In addition, when random parameters of the system change, operators are not required to modify the algorithm; the online learning process may still be performed according to the actual running process of the system, and a better intelligent admission control strategy of the electric vehicle may be obtained adaptively. In particular, the double-center Q-learning method in the present invention solves the asynchronous decision problem in the cooperative-optimization control of the charging station and overcomes the defects of a centralized synchronous decision method.
[0070] 4. The cooperative-optimization control method of the charging station based on the double-center Q-learning method according to the present invention is also suitable for the situation where charging prices are different in different periods of time and the situation where the power-grid peak regulation electricity price is issued non-periodically (or randomly).

BRIEF DESCRIPTION OF THE DRAWINGS
[0071] Fig. 1 is a flow chart of an electric-vehicle admission control center in the method according to the present invention;
[0072] Fig. 2 is a flow chart of a control center for peak regulation response in the method according to the present invention; and
[0073] Fig. 3 is a schematic diagram of a service system of a charging station according to the present invention.

DETAILED DESCRIPTION
[0074] In this embodiment, as shown in Fig. 3, a cooperative-optimization control method of a charging station based on a double-center Q-learning method is applied to a service system of the charging station, which includes J_D DC charging points 1, J_A AC charging points 2, J_AD AC and DC-hybrid charging points 3, M_D DC fast-charging electric vehicles 4 which arrive randomly, M_A AC slow-charging electric vehicles 5 which arrive randomly, a power-grid peak regulation electricity price plan 6, an admission control center 7 and a control center 8 for peak regulation response;
[0075] each DC charging point is enabled to adaptively meet charging power demands of the M_D DC fast-charging electric vehicles, each AC charging point is enabled to adaptively meet charging power demands of the M_A AC slow-charging electric vehicles, each AC and DC-hybrid charging point is enabled to meet the charging power demands of the M_D DC fast-charging electric vehicles and the M_A AC slow-charging electric vehicles, and one charging point is enabled to provide charging service for only one electric vehicle at a time;
[0076] the jth DC charging point is denoted as CS_j^D, j ∈ D_{J_D} = {1, 2, ..., J_D}, and D_{J_D} represents a set of codes of the DC charging points, thereby denoting the J_D DC charging points as CS_1^D, CS_2^D, ..., CS_j^D, ..., CS_{J_D}^D respectively; the jth AC charging point
[0077] the charging power demand of the mth DC charging electric vehicle is
[0078] the charging power demand of the mth AC charging electric vehicle is
[0079] K is set as the maximum period number in one day, a corresponding total time length is T, and a power-grid peak regulation electricity price at any epoch t under the total time length T is denoted as PR_t yuan/kWh, PR_t ∈ D_{PR}, wherein D_{PR} is a limited electricity-price state space; it is assumed that the power-grid peak regulation electricity price is periodically issued according to a dispatching instruction, and τ_k is the epoch when the kth peak regulation electricity price PR_k is issued; the price is maintained to the epoch τ_{k+1} when the next peak regulation electricity price is issued; that is, PR_t = PR_k, τ_k ≤ t < τ_{k+1}, k = 0, 1, 2, ..., K-1, and τ_0 = 0; a peak regulation electricity price sequence is denoted as {(τ_k, PR_k) | k = 0, 1, 2, ..., K-1; τ_0 = 0}, wherein PR_k ∈ D_{PR}, τ_K = T and PR_T = PR_{K-1};
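The price sequence {(τ_k, PR_k)} thus defines a piecewise-constant price: PR_t = PR_k for τ_k ≤ t < τ_{k+1}. A small lookup sketch; the issue epochs and prices below are assumed illustrative values, not data from the patent.

```python
import bisect

# Assumed peak-regulation price sequence: K = 4 periods over T = 24 h.
taus = [0.0, 6.0, 12.0, 18.0]     # issue epochs tau_k (hours)
prices = [0.3, 0.8, 1.2, 0.5]     # PR_k in yuan/kWh

def price_at(t):
    """Return PR_t for 0 <= t < T by locating the active period k."""
    k = bisect.bisect_right(taus, t) - 1   # largest k with tau_k <= t
    return prices[k]

print(price_at(7.5))   # falls in period k = 1, so PR_t = 0.8
```

Both decision centers can share such a lookup, since the current price PR_t is part of the system state at every decision-making epoch.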
[0080] the charging station provides paid charging service, and the price of the charging service of the charging station is PRoy yuan/kWh; PRoy is at least less than
[0081] the event that the m_t th electric vehicle, with the battery having the state of charge (SOC) SOC_{m_t}(t) at the epoch t, randomly arrives at the charging station to apply for the charging service is denoted as an arrival event E(m_t, SOC_{m_t}(t)), and m_t ∈ D_{M_D} ∪ D_{M_A};
[0082] the service state of the jth DC charging point at the epoch t is denoted as CS_j^D(t) = (m_j^D(t), SOC_{m_j^D(t)}(t)), thereby denoting the combined state of the J_D DC charging points at the epoch t as CS^D = (CS_1^D(t), CS_2^D(t), ..., CS_j^D(t), ..., CS_{J_D}^D(t)); m_j^D(t) represents the type of the electric vehicle which is served by the jth DC charging point CS_j^D at the epoch t, and m_j^D(t) = 0 indicates that no vehicle is admitted at the jth DC
[0083] the service state of the jth AC charging point at the epoch t is denoted as CS_j^A(t) = (m_j^A(t), SOC_{m_j^A(t)}(t)), thereby denoting the combined state of the J_A AC charging points at the epoch t as CS^A = (CS_1^A(t), CS_2^A(t), ..., CS_j^A(t), ..., CS_{J_A}^A(t));
[0084] the service state of the jth AC and DC-hybrid charging point at the epoch t is denoted as CS_j^{AD}(t) = (m_j^{AD}(t), SOC_{m_j^{AD}(t)}(t)), thereby denoting the combined state of the J_AD AC and DC-hybrid charging points at the epoch t as CS^{AD} = (CS_1^{AD}(t), CS_2^{AD}(t), ..., CS_j^{AD}(t), ..., CS_{J_AD}^{AD}(t)); m_j^{AD}(t) represents the type of the electric vehicle which is served by the jth AC and DC-hybrid charging point
[0085] the combined state of the three types of charging points at the epoch t is denoted as C_t = {CS^D, CS^A, CS^{AD}};
[0086] the state of the service system of the charging station at any epoch t is denoted as s_t = {t, C_t, PR_t};
[0087] it is assumed that the DC charging electric vehicle admitted into the service in the service system of the charging station does not participate in peak regulation, and the AC charging electric vehicle admitted into the service may participate in peak regulation; it is assumed that the discharge power of one AC charging electric vehicle is equal to its charging power, and the discharge reward per unit time per unit discharge power at any epoch is equal to the real-time power-grid electricity price;
[0088] the epoch T_k of issuing the kth peak regulation electricity price PR_k is taken as a decision-making epoch for peak regulation of the control center for peak regulation response, and the energy exchange directions of all the AC charging electric vehicles admitted by the service system of the charging station are denoted as the action d_k at the decision-making epoch for peak regulation, d_k = {(d_k^A(1), d_k^A(2), ..., d_k^A(J_A)), (d_k^AD(1), d_k^AD(2), ..., d_k^AD(j), ..., d_k^AD(J_AD))};
[0089] at the kth decision-making epoch T_k for peak regulation, in the jth AC and DC-hybrid charging point, if m_j^AD(t) ∈ Φ_M2, d_k^AD(j) ∈ D_r^j, and if m_j^AD(t) = 0 or m_j^AD(t) ∈ Φ_M1, d_k^AD(j) = 0, j ∈ Φ_JAD; in the jth AC charging point, if m_j^A(t) = 0, d_k^A(j) = 0, j ∈ Φ_JA;
[0090] at the kth decision-making epoch T_k for peak regulation, the set of feasible peak regulation control actions d_k of the charging station is denoted as D_r^k, and d_k ∈ D_r^k ⊆ D_r, wherein D_r is a Cartesian product of the J_A + J_AD sets D_r^j of peak regulation control actions, i.e., D_r = D_r^1 × D_r^2 × ... × D_r^{J_A + J_AD}; the total number of actions in the set D_r is denoted as C;
[0091] all the actions in D_r are encoded, d(c) is set to represent the cth action, and d(c) ∈ D_r, c = 1, 2, ..., C;
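The encoding of the joint action set D_r as a Cartesian product can be sketched as below; a minimal illustration, in which the point counts J_A and J_AD and the per-point action set {-1, 0, 1} (discharge, idle, charge) are assumptions for demonstration, not values fixed by the description.

```python
from itertools import product

# Illustrative sketch: the joint peak-regulation action set D_r is the
# Cartesian product of per-charging-point action sets; each joint action
# d(c) is indexed by c = 1, ..., C. Point counts and per-point actions
# below are assumptions, not values from the patent text.
J_A, J_AD = 2, 1                      # AC points and AC/DC-hybrid points
per_point_actions = (-1, 0, 1)        # -1 = discharge, 0 = idle, 1 = charge

D_r = list(product(per_point_actions, repeat=J_A + J_AD))
C = len(D_r)                          # total number of joint actions

def d(c):
    """Return the c-th encoded joint action d(c), c = 1, 2, ..., C."""
    return D_r[c - 1]
```

With these assumed values, C = 3^(J_A + J_AD) = 27 joint actions are enumerated and indexed.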
[0092] the occurrence epoch t of the arrival event E(m_t, SOC_{m_t}(t)) of the m_t-th electric vehicle is taken as a decision-making epoch for admission control of the admission control center for the electric vehicle, and the event information of the current epoch and the current state information of the service system of the charging station are combined and denoted as an event-extended state s_t^e = {t, C_t, PR_k, m_t, SOC_{m_t}(t)};
[0093] the epoch when the nth event s_t^e occurs is denoted as T_n, i.e., t = T_n, and the corresponding peak regulation period for the power-grid electricity price is denoted as [T_{k_n}, T_{k_n+1}), k_n ∈ {0, 1, ..., K}, T_n ∈ [T_{k_n}, T_{k_n+1});
[0094] the change interval [0, 1] of the SOC of the battery of the electric vehicle is discretized by using a small constant δ to obtain a discretization event-extended state s_n^e = {k_n, C_n, PR_{k_n}, m_n, SOC_{m_n}(t)} corresponding to s_t^e;
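The SOC discretization step can be sketched as follows; the value δ = 0.1 and the level-index convention are illustrative assumptions, not choices fixed by the description.

```python
# Sketch of discretizing the SOC change interval [0, 1] with a small
# constant delta; delta = 0.1 and the level-index convention are
# illustrative assumptions.
DELTA = 0.1

def discretize_soc(soc, delta=DELTA):
    """Map a continuous SOC in [0, 1] to a discrete level index."""
    if not 0.0 <= soc <= 1.0:
        raise ValueError("SOC must lie in [0, 1]")
    levels = round(1.0 / delta)
    # SOC = 1.0 is folded into the top level so indices run 0 .. levels - 1.
    return min(int(soc / delta), levels - 1)
```

A coarser or finer δ trades off Q-table size against the resolution of the event-extended state.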
[0095] the state space formed by all possible discretization event-extended states is denoted as Φ^e, i.e., s_n^e ∈ Φ^e, and the total number of the discretization event-extended states of the system is denoted as S^e;
[0096] all the possible discretization event-extended states are encoded, s_n^e(s) represents the sth discretization event-extended state, and s_n^e(s) ∈ Φ^e, s = 1, 2, ..., S^e; the set of all possible discretization event-extended states where a DC-charging-electric-vehicle arriving event occurs and all the DC charging points and all the AC and DC-hybrid charging points are in service is denoted as Φ_1^e, and the set of all possible discretization event-extended states where an AC-charging-electric-vehicle arriving event occurs and all the AC charging points and all the AC and DC-hybrid charging points are in service is denoted as Φ_2^e;
[0097] under the same discretization rule, the discretization state corresponding to the state s_t of the service system of the charging station at any epoch t is denoted as s_k, s_k = {k, C_k, PR_k}, and C_k and C_n have a consistent value space; the state space formed by all the possible discretization states of the service system of the charging station is denoted as Φ, i.e., s_k ∈ Φ, and the total number of the discretization states of the system is denoted as S;
[0098] all the possible discretization states of the service system of the charging station are encoded, s_k(s) represents the sth discretization state, and s_k(s) ∈ Φ, s = 1, 2, ..., S;
[0099] the decision-making epoch for admission control of the system is defined as the arrival epoch of any electric vehicle, i.e., the event occurrence epoch;
[00100] whether the service system of the charging station admits the charging request of the electric vehicle which arrives randomly and provides the charging service or not is taken as an admission control action a, and the action at the nth decision-making epoch T_n is denoted as a_n, a_n ∈ {0, 1};
[00101] at any decision-making epoch T_n for admission control, if the type m_t ∈ Φ_M1 of the arriving electric vehicle, and all the DC charging points and all the AC and DC-hybrid charging points are in service, i.e., {m_j^D(t) ∈ Φ_M1, j ∈ Φ_JD} and {m_j^AD(t) ∈ Φ_M1 ∪ Φ_M2, j ∈ Φ_JAD}, then a_n = 0; if the type m_t ∈ Φ_M2 of the arriving electric vehicle, and all the AC charging points and all the AC and DC-hybrid charging points are in service, i.e., {m_j^A(t) ∈ Φ_M2, j ∈ Φ_JA} and {m_j^AD(t) ∈ Φ_M1 ∪ Φ_M2, j ∈ Φ_JAD}, then a_n = 0;
[00102] at the nth decision-making epoch T_n for admission control, if m_t ∈ Φ_M1 and a_n = 1, the arriving DC charging electric vehicle is preferentially admitted into any idle DC charging point and charged immediately; if m_t ∈ Φ_M2 and a_n = 1, the arriving AC charging electric vehicle is preferentially admitted into any idle AC charging point;
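The forced-refusal condition above (a_n = 0 when no compatible charging point is idle) can be sketched as below. The occupancy encoding with 0 = idle follows the text, while the concrete type sets Φ_M1 and Φ_M2 are illustrative assumptions; in states where refusal is not forced, the admit/refuse choice is left to the learned policy rather than to this check.

```python
# Sketch of the forced-refusal rule: an arriving DC-type vehicle (type in
# PHI_M1) cannot be admitted when all DC points and all hybrid points are
# busy; an AC-type vehicle (type in PHI_M2) cannot be admitted when all AC
# points and all hybrid points are busy. Type sets are assumed values.
PHI_M1 = {1}   # DC-charging vehicle types (assumed)
PHI_M2 = {2}   # AC-charging vehicle types (assumed)

def admission_feasible(m_t, dc_points, ac_points, hybrid_points):
    """Return False when the rule forces a_n = 0 for vehicle type m_t."""
    hybrids_busy = all(m != 0 for m in hybrid_points)
    if m_t in PHI_M1:
        return not (all(m != 0 for m in dc_points) and hybrids_busy)
    if m_t in PHI_M2:
        return not (all(m != 0 for m in ac_points) and hybrids_busy)
    return False   # unknown type: never admitted
```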
[00103] the cooperative-optimization control method of the charging station is divided into Q-learning control of the electric-vehicle admission control center and Q-learning control of the control center for peak regulation response;
[00104] as shown in Fig. 1, the Q-learning control method of the electric-vehicle admission control center of the charging station includes the following steps:
[00105] step 1: defining and initializing the exploration rate of the admission control action at the nth decision-making epoch T_n for admission control as ε_n, and letting 0 < ε_n < 1, for example, letting ε_n = 0.8;
[00106] defining the elements in a Q-value table for admission control as discretization event-extended state-action pair learning values, and initializing the elements in the Q-value table for admission control, for example, randomly initializing the value of each element or setting it to 0, wherein the Q-value table for admission control takes the discretization event-extended state of the system at the time of the event as a row and the admission action of the system as a column, i.e.,

Q(s_n^e(1), 0)    Q(s_n^e(1), 1)
Q(s_n^e(2), 0)    Q(s_n^e(2), 1)
...
Q(s_n^e(s), 0)    Q(s_n^e(s), 1)
...
Q(s_n^e(S^e), 0)  Q(s_n^e(S^e), 1)

and if s_n^e(s) ∈ Φ_1^e ∪ Φ_2^e, s = 1, 2, ..., S^e, Q(s_n^e(s), 1) is a negative infinite value;
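The Q-table initialization, including the negative-infinite entries that block "admit" in full states, can be sketched as below; the table size and the blocked row indices are illustrative assumptions.

```python
import math

# Sketch of initializing the admission-control Q-value table: one row per
# discretization event-extended state, columns for the actions 0 (refuse)
# and 1 (admit); rows whose state lies in PHI1_e or PHI2_e get
# Q(s, 1) = -inf so that "admit" can never be the greedy choice there.
# The table size and the blocked row indices are assumed values.
S_e = 6                      # number of discretization event-extended states
blocked_rows = {2, 5}        # states in PHI1_e or PHI2_e (illustrative)

Q = [[0.0, 0.0] for _ in range(S_e)]
for s in blocked_rows:
    Q[s][1] = -math.inf      # admitting is infeasible in these states

def greedy_action(row):
    """Return the column index of the row's greedy (maximum-value) action."""
    return 0 if row[0] >= row[1] else 1
```

Because -inf can never be a row maximum, a blocked row's greedy action is always 0.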
[00107] defining a current greedy control strategy table v as the action set formed by the actions corresponding to the maximum discretization event-extended state-action pair learning value of each row in the Q-value table for admission control;
[00108] step 2: initializing the variables t = 0 and n = 1; assigning the current exploration rate ε_n for the admission control action to ε_1; letting the original strategy table v_0 = v;
[00109] step 3: at the nth decision-making epoch T_n of the service system of the charging station, observing the current event-extended state s_t^e of the service system;
[00110] denoting the discretization state corresponding to the current event-extended state s_t^e of the nth decision-making epoch T_n in the Q-value table as s_n^e;
[00111] denoting the admission control action which is actually taken in the current event-extended state s_t^e at the nth decision-making epoch T_n as v(s_t^e), wherein v(s_t^e) ∈ D_a;
[00112] in the event-extended state s_t^e at the nth decision-making epoch T_n for admission control, if the corresponding discretization state s_n^e meets s_n^e ∈ Φ_1^e ∪ Φ_2^e, letting v(s_t^e) = 0; otherwise, in the current event-extended state s_t^e, extracting the greedy action in the discretization state s_n^e corresponding to s_t^e from the Q-value table, denoting the greedy action as v̄(s_n^e), assigning v̄(s_n^e) to v(s_t^e) with the probability 1 − ε_n, and selecting an action other than v̄(s_n^e) from the action set D_a at the exploration rate ε_n as an exploration action v_ex and assigning the action to v(s_t^e);
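The epsilon-greedy selection just described can be sketched as follows; a minimal illustration in which the Q-row values are placeholders, and the forced action 0 for blocked states follows the text.

```python
import random

# Sketch of the epsilon-greedy choice in step 3: in a blocked state
# (s_n^e in PHI1_e or PHI2_e) the action is forced to 0; otherwise the
# greedy action of the Q-row is taken with probability 1 - eps and the
# only alternative in D_a = {0, 1} is explored with probability eps.
def select_admission_action(q_row, eps, blocked, rng=random):
    if blocked:                        # refusal is forced
        return 0
    greedy = 0 if q_row[0] >= q_row[1] else 1
    if rng.random() < eps:
        return 1 - greedy              # exploration action v_ex
    return greedy
```

With eps = 0 the choice is purely greedy; with eps = 1 it always explores the alternative action.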
[00113] after the service system of the charging station takes the action v(s_t^e), observing and obtaining a system transition sample track (s_t^e, v(s_t^e), s_{t'}^e) transited from the nth decision-making epoch T_n for admission control to the (n+1)th decision-making epoch T_{n+1} for admission control or the epoch T, wherein t = T_n and t' = T_{n+1};
[00114] step 4: observing the service system of the charging station, and with Equ. (1), calculating the combined quantity r(s_t^e, v(s_t^e), s_{t'}^e) of accumulated charging rewards and peak regulation rewards obtained in the state transition process of the system from the state s_t^e = {t, C_t, PR_k, m_t, SOC_{m_t}(t)}, in which the current action v(s_t^e) is taken at the nth decision-making epoch T_n for admission control, to the state s_{t'}^e = {t', C_{t'}, PR_{k'}, m_{t'}, SOC_{m_{t'}}(t')} at the (n+1)th decision-making epoch T_{n+1} for admission control or the epoch T;
[00115] r(s_t^e, v(s_t^e), s_{t'}^e) = ∫_t^{t'} [ Σ_{j=1}^{J_D} sgn(m_j^D(t)) P^D + Σ_{j=1}^{J_AD} 1_{Φ_M1}(m_j^AD(t)) P^D + Σ_{j=1}^{J_A} sgn(m_j^A(t)) d_{k_n}^A(j) P^A + Σ_{j=1}^{J_AD} 1_{Φ_M2}(m_j^AD(t)) d_{k_n}^AD(j) P^A ] PR_{k_n} dt (1)

[00116] wherein in Equ. (1), t = T_n, and t' = T_{n+1} or t' = T; it is defined that when m_j^D(t) = 0, sgn(m_j^D(t)) = 0, and when m_j^D(t) > 0, sgn(m_j^D(t)) = 1; when m_j^A(t) = 0, sgn(m_j^A(t)) = 0, and when m_j^A(t) > 0, sgn(m_j^A(t)) = 1; when m_j^AD(t) ∈ Φ_M1, 1_{Φ_M1}(m_j^AD(t)) = 1, otherwise, 1_{Φ_M1}(m_j^AD(t)) = 0; when m_j^AD(t) ∈ Φ_M2, 1_{Φ_M2}(m_j^AD(t)) = 1, otherwise, 1_{Φ_M2}(m_j^AD(t)) = 0; P^D and P^A represent the charging power of one DC charging point and one AC charging point, respectively;
[00117] step 5: updating the discretization event-extended state-action pair learning value Q(s_n^e, v(s_t^e)) for taking the action v(s_t^e) in the discretization state s_n^e corresponding to s_t^e in the Q-value table for admission control by using the difference formula and the Q-value updating formula shown in Equ. (2) and Equ. (3), obtaining the updated learning value and assigning it to Q(s_n^e, v(s_t^e));

[00118] δ(s_t^e, v(s_t^e), s_{t'}^e) = r(s_t^e, v(s_t^e), s_{t'}^e) + max_{a ∈ D_a} Q(s_{n+1}^e, a) − Q(s_n^e, v(s_t^e)) (2)
[00119] Q(s_n^e, v(s_t^e)) := Q(s_n^e, v(s_t^e)) + γ(s_n^e, v(s_t^e)) δ(s_t^e, v(s_t^e), s_{t'}^e) (3)
[00120] wherein in Equ. (2), Q(s_{n+1}^e, a) represents the discretization event-extended state-action pair learning value for taking the action a in the discretization state s_{n+1}^e corresponding to the state s_{t'}^e of transition to the (n+1)th decision-making epoch T_{n+1} for admission control or the epoch T;
[00121] in Equ. (3), the operator ":=" indicates that the value of the right formula is calculated first and then given to the left variable; γ(s_n^e, v(s_t^e)) is the learning step length for taking the action v(s_t^e) in the discretization event-extended state s_n^e at the nth decision-making epoch T_n for admission control;
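The one-step update of Equ. (2) and Equ. (3) can be sketched as below; the table contents, reward, and step length are illustrative values.

```python
# Sketch of the update of Equ. (2)-(3): the temporal difference delta adds
# the observed combined reward r to the best learning value of the next
# row and subtracts the current entry; the entry then moves by a step
# length gamma along delta. All numeric values are illustrative.
def q_update(Q, s, a, r, s_next, gamma):
    """Apply Q(s, a) := Q(s, a) + gamma * (r + max_a' Q(s', a') - Q(s, a))."""
    delta = r + max(Q[s_next]) - Q[s][a]     # Equ. (2)
    Q[s][a] += gamma * delta                 # Equ. (3)
    return Q[s][a]

Q = [[0.0, 0.0], [1.0, 3.0]]
updated = q_update(Q, s=0, a=1, r=2.0, s_next=1, gamma=0.5)
# delta = 2.0 + 3.0 - 0.0 = 5.0, so the entry becomes 0.0 + 0.5 * 5.0 = 2.5
```

The same update shape is reused by the peak-regulation center in Equ. (5) and Equ. (6), with the maximization taken over the feasible action set instead of {0, 1}.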
[00122] step 6: selecting the action corresponding to the maximum discretization event-extended state-action pair learning value of each row in the updated Q-value table for admission control to form the current action set for admission control, taking the current action set as the updated greedy strategy table for admission control, and assigning it to the current greedy strategy v for admission control; degrading the exploration rate ε_n, thereby obtaining the updated exploration rate and assigning it to ε_{n+1};
[00123] step 7: if t < T, assigning n + 1 to n and returning to the step 3; otherwise, indicating t = T, and performing step 8; and
[00124] step 8: judging whether the strategy table v for admission control is equal to v_0 or not; if so, stopping updating and performing admission control on the random charging service requests of the M electric vehicles with the current strategy table v for admission control; otherwise, returning to the step 2 for execution.
[00125] As shown in Fig. 2, the Q-learning control method of the control center for peak regulation of the charging station includes the following steps:
[00126] step -1: defining and initializing the exploration rate of the peak regulation control action at the kth decision-making epoch T_k for peak regulation control as ε_k, and letting 0 < ε_k < 1, for example, letting ε_k = 0.9;
[00127] defining the elements in a Q-value table for peak regulation control as state-action pair learning values of the service system of the charging station, and initializing the elements in the Q-value table for peak regulation control, for example, randomly initializing the value of each element or setting it to 0, wherein the Q-value table for peak regulation control takes the discretization state of the service system of the charging station as a row and the peak regulation control action of the system as a column;
[00128] defining a current greedy strategy table v̄ for peak regulation control as the action set formed by the actions corresponding to the maximum discretization state-action pair learning value of each row in the Q-value table for peak regulation control;
[00129] step -2: initializing t = 0 and k = 0; assigning the current exploration rate ε_k for the peak regulation control action to ε_0; setting the original greedy strategy table for peak regulation control as v̄_0 = v̄;
[00130] step -3: at the kth decision-making epoch T_k for peak regulation control of the service system of the charging station, observing the current state s_t of the service system;
[00131] denoting the discretization state corresponding to the system state s_t of the kth decision-making epoch T_k for peak regulation control in the Q-value table for peak regulation control as s_k;
[00132] denoting the peak regulation control action which is actually taken in the system state s_t at the kth decision-making epoch T_k for peak regulation control as v̄(s_t), wherein v̄(s_t) ∈ D_r^k;
[00133] in the system state s_t at the kth decision-making epoch T_k for peak regulation control, extracting the greedy action in the discretization state s_k corresponding to s_t from the Q-value table for peak regulation control, denoting it as v̄(s_k), and assigning v̄(s_k) to v̄(s_t) with the probability 1 − ε_k;
[00134] in the system state s_t at the kth decision-making epoch T_k for peak regulation control, randomly selecting an action v_ex other than v̄(s_k) from the current feasible action set D_r^k at the exploration rate ε_k as an exploration action and assigning it to v̄(s_t);
[00135] after the control center for peak regulation of the charging station takes the action v̄(s_t), observing and obtaining a system transition sample track (s_t, v̄(s_t), s_{t'}) transited from the kth decision-making epoch T_k for peak regulation control to the (k+1)th decision-making epoch T_{k+1} for peak regulation control, wherein t = T_k and t' = T_{k+1};
[00136] step -4: observing the service system of the charging station, and with Equ. (4), calculating the combined quantity r(s_t, v̄(s_t), s_{t'}) of charging rewards and peak regulation rewards obtained in the state transition process of the system from the state s_t = {t, C_t, PR_k}, in which the current action v̄(s_t) is taken at the kth decision-making epoch T_k for peak regulation control, to the state s_{t'} = {t', C_{t'}, PR_{k+1}} at the (k+1)th decision-making epoch T_{k+1} for peak regulation control;

[00137] r(s_t, v̄(s_t), s_{t'}) = ∫_t^{t'} [ Σ_{j=1}^{J_D} sgn(m_j^D(t)) P^D + Σ_{j=1}^{J_AD} 1_{Φ_M1}(m_j^AD(t)) P^D + Σ_{j=1}^{J_A} sgn(m_j^A(t)) d_k^A(j) P^A + Σ_{j=1}^{J_AD} 1_{Φ_M2}(m_j^AD(t)) d_k^AD(j) P^A ] PR_k dt (4)
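The bracketed integrand of the reward in Equ. (4) can be sketched as below; the power levels, type sets, and occupancy vectors are illustrative assumptions, and multiplying the result by the price and a small time step approximates one slice of the integral.

```python
# Sketch of the bracketed integrand of Equ. (4): busy DC points and hybrid
# points serving DC vehicles draw the DC power P_D, while AC points and
# hybrid points serving AC vehicles exchange d * P_A, with d = 1 for
# charging and d = -1 for discharging. All concrete values are assumed.
PHI_M1, PHI_M2 = {1}, {2}             # DC / AC vehicle type sets (assumed)

def station_power(dc, ac, hybrid, d_ac, d_hy, P_D=60.0, P_A=7.0):
    """Net station power for given occupancy and peak-regulation actions."""
    p = sum(P_D for m in dc if m != 0)
    p += sum(P_D for m in hybrid if m in PHI_M1)
    p += sum(d * P_A for m, d in zip(ac, d_ac) if m != 0)
    p += sum(d * P_A for m, d in zip(hybrid, d_hy) if m in PHI_M2)
    return p
```

For example, with one busy DC point, one discharging AC point, and one hybrid point charging an AC vehicle, the net power is 60 − 7 + 7 = 60 kW under the assumed power levels.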
[00138] step -5: updating the discretization state-action pair learning value Q(s_k, v̄(s_t)) for taking the action v̄(s_t) in the discretization state s_k corresponding to s_t in the Q-value table for peak regulation control by using the difference formula and the Q-value updating formula shown in Equ. (5) and Equ. (6), obtaining the updated learning value and assigning it to Q(s_k, v̄(s_t));

[00139] δ(s_t, v̄(s_t), s_{t'}) = r(s_t, v̄(s_t), s_{t'}) + max_{d ∈ D_r^{k+1}} Q(s_{k+1}, d) − Q(s_k, v̄(s_t)) (5)

[00140] Q(s_k, v̄(s_t)) := Q(s_k, v̄(s_t)) + γ(s_k, v̄(s_t)) δ(s_t, v̄(s_t), s_{t'}) (6)
[00141] wherein in Equ. (5), Q(s_{k+1}, d) represents the discretization state-action pair learning value for taking the feasible action d in the discretization state s_{k+1} corresponding to the state s_{t'} of transition of the system to the (k+1)th decision-making epoch T_{k+1} for peak regulation control;
[00142] in Equ. (6), the operator ":=" indicates that the value of the right formula is calculated first and then given to the left variable; γ(s_k, v̄(s_t)) is the learning step length for taking the action v̄(s_t) in the discretization state s_k at the kth decision-making epoch T_k for peak regulation control;
[00143] step -6: selecting the action corresponding to the maximum discretization state-action pair learning value of each row in the updated Q-value table for peak regulation control to form the current action set for peak regulation control, taking the current action set as the updated greedy strategy table for peak regulation control, and assigning it to the current greedy strategy v̄ for peak regulation control; degrading the exploration rate ε_k, thereby obtaining the updated exploration rate and assigning it to ε_{k+1};
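The degradation of the exploration rate can be sketched as follows; the multiplicative factor and lower floor are illustrative assumptions, chosen only to keep 0 < ε < 1 while gradually shifting from exploration to exploitation.

```python
# Sketch of "degrading the exploration rate" in step -6: a multiplicative
# decay with a lower floor; the factor and floor are assumed values.
def decay_eps(eps, factor=0.99, floor=0.01):
    return max(eps * factor, floor)

eps = 0.9
for _ in range(3):
    eps = decay_eps(eps)   # eps shrinks toward the floor over the epochs
```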
[00144] step -7: if k < K, assigning k + 1 to k, and then returning to the step -3; otherwise, performing step -8; and
[00145] step -8: judging whether the strategy table v̄ for peak regulation control is equal to v̄_0 or not; if so, stopping updating and performing peak regulation control on the AC charging electric vehicles served by the charging station with the current greedy strategy table v̄ for peak regulation control; otherwise, returning to the step -2 for execution.
Claims

1.
A cooperative-optimization control method of a charging station based on a double-center Q-learning method, wherein the cooperative-optimization control method is divided into electric-vehicle admission control and peak-regulation response control; in the admission control of an electric vehicle, a control process of the charging service requests of DC charging electric vehicles and AC charging electric vehicles which arrive randomly is described as an event-driven decision-making process, wherein the arrival of an electric vehicle making a service request is taken as an event, a peak regulation electricity price and the online service state of the charging points are taken as the state of a service system of the charging station, when the event occurs, the event information and the state information of the service system are combined into an event-extended state, whether the electric vehicle is admitted and provided with charging service or not is selected as an admission control action, sample data feedback is thereby obtained, a Q-value table for admission control is updated with a Q-learning method, and an admission control strategy table is finally obtained; in the peak-regulation response control, a process of controlling the electric vehicles being charged in the charging station to respond to a power-grid peak regulation electricity price plan is described as a sequential decision-making process, wherein in the period when the peak regulation electricity price is issued, the charging and discharging actions of all the served AC charging electric vehicles are selected as peak regulation control actions according to the state of the service system of the charging station, sample data feedback is thereby obtained, a Q-value table for peak regulation control is updated with a Q-learning method, and a strategy table for peak regulation control is finally obtained.
Family publication: CN110991931A, published 2020-04-10.
Priority application: CN201911316131.XA, filed 2019-12-19, "Charging station cooperative optimization control method based on double-center Q learning".